Optimal size, freshness and time-frame for voice search vocabulary
Abstract
In this paper, we investigate how to optimize the vocabulary for a voice search language model. The metric we optimize over is the out-of-vocabulary (OoV) rate, since it is a strong indicator of user experience. In a departure from the usual way of measuring OoV rates, web search logs allow us to compute the per-session OoV rate and thus estimate the percentage of users that experience a given OoV rate. Under very conservative text normalization, we find that a voice search vocabulary consisting of 2 to 2.5 million words extracted from 1 week of search query data will result in an aggregate OoV rate of 0.01 (1%); at that size, the same OoV rate will also be experienced by 90% of users. The number of words included in the vocabulary is a stable indicator of the OoV rate. Altering the freshness of the vocabulary or the duration of the time window over which the training data is gathered does not significantly change the OoV rate. Surprisingly, a significantly larger vocabulary (approximately 10 million words) is required to guarantee OoV rates below 0.01 (1%) for 95% of the users.
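To make the distinction between the aggregate and the per-session OoV rate concrete, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline: the function names, the whitespace tokenization, the frequency-based vocabulary selection, and the representation of sessions as a dictionary from session id to a list of queries are all assumptions made for this example.

```python
from collections import Counter


def build_vocabulary(training_queries, size):
    """Keep the `size` most frequent words (assumed selection rule)."""
    counts = Counter(w for q in training_queries for w in q.split())
    return {w for w, _ in counts.most_common(size)}


def aggregate_oov_rate(sessions, vocab):
    """Fraction of all test words, pooled over sessions, not in vocab."""
    total = 0
    oov = 0
    for queries in sessions.values():
        for q in queries:
            for w in q.split():
                total += 1
                oov += w not in vocab
    return oov / total if total else 0.0


def per_session_oov_rates(sessions, vocab):
    """OoV rate computed separately for each non-empty session."""
    rates = {}
    for session_id, queries in sessions.items():
        words = [w for q in queries for w in q.split()]
        if words:
            rates[session_id] = sum(w not in vocab for w in words) / len(words)
    return rates


def fraction_of_sessions_at_or_below(rates, threshold):
    """Share of sessions whose OoV rate does not exceed `threshold`."""
    if not rates:
        return 0.0
    return sum(r <= threshold for r in rates.values()) / len(rates)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    training = ["weather today", "weather today", "weather", "pizza"]
    sessions = {"s1": ["weather today"], "s2": ["pizza today"]}
    vocab = build_vocabulary(training, size=2)
    rates = per_session_oov_rates(sessions, vocab)
    print(aggregate_oov_rate(sessions, vocab))
    print(fraction_of_sessions_at_or_below(rates, 0.01))
```

The aggregate rate pools every word, so a few heavy sessions can dominate it; the per-session view instead shows what share of users stays at or below a target rate, which is the quantity behind statements such as "experienced by 90% of users".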
Similar resources
OPTIMAL DESIGN OF STEEL MOMENT FRAME STRUCTURES USING THE GA-BASED REDUCED SEARCH SPACE (GA-RSS) TECHNIQUE
This paper proposes a GA-based reduced search space technique (GA-RSS) for the optimal design of steel moment frames. It tries to reduce the computation time by focusing the search around the boundaries of the constraints, using a ranking-based constraint handling to enhance the efficiency of the algorithm. This attempt to reduce the search space is due to the fact that in most optimization pro...
Searching for optimal frame patterns in an integrated TDMA communication system using mean field annealing
In an integrated time-division multiple access (TDMA) communication system, voice and data are multiplexed in time to share a common transmission link in a frame format in which time is divided into slots. A certain number of time slots in a frame are allocated to voice and the rest are used to transmit data. Maximum data throughput can be achieved by searching for the optimal configuration(s) ...
EasyCmd: Navigation by Voice Commands
In this paper we present a system named EasyCmd that provides voice navigation on the desktop of the Microsoft Windows 9x system. The speech recognition engine for EasyCmd is very similar to that of a dictation machine. The Statistical Knowledge Based Frame Synchronous Search algorithm (SKBFSS) and Word Search Tree (WST) technologies are applied for acoustic decoding. Recognition Score Gap (RSG) is used for ...
Optimal Placement and Sizing of TCSC & SVC for Improvement Power System Operation using Crow Search Algorithm
Abstract: The need for more efficient power systems has prompted the use of new technologies, including Flexible AC Transmission System (FACTS) devices. FACTS devices provide new opportunities for controlling the line power flow and minimizing losses while maintaining the bus voltages within a permissible limit. In this thesis a new method is proposed for optimal placement and sizing of Thyristo...
Application of Frame Semantics to Teaching Seeing and Hearing Vocabulary to Iranian EFL Learners
A term in one language rarely has an absolute synonymous meaning in the same language; besides, it rarely has an equivalent meaning in an L2. English synonyms of seeing and hearing are particularly grammatically and semantically different. Frame semantics is a good tool for discovering differences between synonymous words in L2 and differences between supposed L1 and L2 equivalents. Vocabulary ...
Journal:
- CoRR
Volume abs/1210.8436
Publication date 2012